ABSTRACT

There has been intense research interest in moving object indexing
in the past decade. However, existing work did not exploit the
important property of skewed velocity distributions. In many real 
world scenarios, objects travel predominantly along only a few di- 
rections. Examples include vehicles on road networks, flights, peo- 
ple walking on the streets, etc. The search space for a query is heav- 
ily dependent on the velocity distribution of the objects grouped in 
the nodes of an index tree. Motivated by this observation, we pro- 
pose the velocity partitioning (VP) technique, which exploits the 
skew in velocity distribution to speed up query processing using 
moving object indexes. The VP technique first identifies the "dominant
velocity axes (DVAs)" using a combination of principal components
analysis (PCA) and k-means clustering. Then, a moving
object index (e.g., a TPR-tree) is created based on each DVA, using 
the DVA as an axis of the underlying coordinate system. An object 
is maintained in the index whose DVA is closest to the object's cur- 
rent moving direction. Thus, all the objects in an index are moving 
in a near 1-dimensional space instead of a 2-dimensional space. As
a result, the expansion of the search space with time is greatly re- 
duced, from a quadratic function of the maximum speed (of the ob- 
jects in the search range) to a near linear function of the maximum 
speed. The VP technique can be applied to a wide range of moving 
object index structures. We have implemented the VP technique on 
two representative ones, the TPR*-tree and the B^x-tree. Extensive
experiments validate that the VP technique consistently improves 
the performance of those index structures. 

1. INTRODUCTION 

GPS-enabled mobile devices (phones, car navigators, etc.) are
ubiquitous these days and it is common for them to report their lo- 
cations to a server in order to get location based services. Such 
services involve querying the current or near future locations of 
the mobile devices. Many index structures have been proposed to 
facilitate efficient query processing on moving objects in the last 
decade (e.g., [8, 13, 17, 20, 21, 23, 25]). However, none of these 
index structures exploit the important property of skewed velocity 
distributions. In most real world scenarios, objects travel predominantly
along only a few directions due to the fixed underlying travelling
infrastructure or routes. Examples include vehicles on road
networks, flights, people walking on the streets, etc. Figure 1(a) 
shows a portion of the road network of San Francisco, where most 
of the roads are along two directions. Figure 1(b) shows a sample 
of velocity distribution of the cars travelling on the San Francisco 
road network. Every point (2-dimensional vector) in the figure rep- 
resents the velocity of a car. It is clear that most of the cars are 
travelling along two dominant directions (axes). 




Figure 1: San Francisco road network and the cars' velocity distribution. (a) San Francisco road network; (b) Velocity distribution of the cars.

The velocity distribution of objects in an index has a great impact 
on the rate at which the query search space expands. The search 
space expansion is either due to the tree nodes' minimum bound- 
ing rectangle (MBR) expansion (e.g., the TPR-tree/TPR*-tree [21, 
23]) or query expansion (e.g., the B^x-tree [13]). In either case, the
search space for a tree node is enlarged during the query time inter- 
val using the largest speed of the objects grouped in that tree node. 
If the velocities of the objects in a node are randomly distributed,
then the search space is enlarged along both the x- and y-axes, and
therefore grows as a quadratic function of the maximum speed of the
objects in the node. If the movements of all the objects in a node are
largely along the same direction, then the search space is enlarged
mainly along one axis and hence grows as close to a linear function
of the maximum speed of the objects in the node.

Motivated by this observation, we propose the velocity partition- 
ing (VP) technique, which exploits the skew in velocity distribution 
to speed up query processing using moving object indexes. The 
VP technique first identifies the "dominant velocity axes (DVAs)" 
using a combination of principal components analysis (PCA) and 
k-means clustering. A DVA is an axis to which the velocities of most
of the objects are (almost) parallel. Then, a moving object index
(e.g., a TPR-tree) is created based on each DVA, using the DVA as 
an axis of the underlying coordinate system. Objects are dynami- 
cally moved between DVA indexes when their movement directions 
change from one DVA to another. Objects whose current velocities
are far from every DVA are put in an outlier index. The outlier
index uses the regular coordinate system. Thus, except for the
outlier index, the objects in every index are moving in a near
1-dimensional space instead of a 2-dimensional space. As a result,
the expansion of the search space with time is greatly reduced, from 
a quadratic function of the maximum speed (of the objects in the 
search range) to a near linear function of the maximum speed. 

The VP technique is a generic method and can be applied to a 
wide range of moving object index structures. In this paper, we focus
our analysis and implementation of the VP technique on the two
most widely recognized and representative moving object indexes of
different styles, the TPR*-tree [23] and the B^x-tree [13]. These two
indexes are the basis for many recent indexing techniques [7, 22, 
24, 25]. Our method can be applied to these more recent indexes 
in similar ways to how it is applied to those two representative in- 
dexes. We perform an extensive set of experiments using various 
real and synthetic data sets. The results show that the VP tech- 
nique consistently improves the performance of both index struc- 
tures. The improvement is up to around 3 times in terms of both 
query I/O and query execution time for both index structures. 

The contributions of this paper are summarized below: 

• We analytically show why a moving object index with VP 
outperforms a moving object index without VP. 

• We propose the VP technique, which identifies the dominant
velocity axes (DVAs) and maintains the objects in separate
indexes based on the DVAs.

• We analytically show how to choose the value of an impor- 
tant parameter that determines which objects belong to the 
outlier index. 

• We implemented the VP technique on two state-of-the-art 
moving object indexes, the TPR*-tree and the B^x-tree. We
have performed an extensive experimental study. The results 
validate the effectiveness of our approach across a large num- 
ber of real and synthetic data sets. 

2. PRELIMINARIES 

In this section, we provide some background on moving objects, 
and briefly review two techniques used in our approach, principal 
components analysis (PCA) and k-means clustering.

2.1 Moving Object Representation and Querying

A simple way of tracking the location of moving objects is to 
take location samples periodically. However, this approach requires 
frequent location updates, which imposes a heavy workload on the 
system. A popular method to reduce the reporting rate is to use a 
linear function to describe the near future trajectory of moving ob- 
jects. The model consists of the initial location of the object and a 
velocity vector. An update is issued by the object when its velocity 
changes. An object velocity update simply consists of a deletion 
followed by an insertion. This linear model based approach is used 
by many studies [8, 13, 17, 19, 20, 21, 23, 25, 26, 28] on indexing 
and querying moving objects. We also follow this model in this 
paper, and the moving objects are modeled as moving points. 

We support three different types of range queries: time slice 
range query, which reports the objects within the query range at a 
particular time stamp; time interval range query, which reports the 
objects within the query range within a time range; moving range 
query, where the query range itself is moving and the query reports 
the objects that intersect the moving range in a time range. For all 
three types of range queries, if the query timestamp (or time range) 
is in the future, the query range is projected (expanded) to that fu- 
ture time to check which objects should be returned. 
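The linear motion model and the time slice range query described above can be sketched as follows. This is a minimal illustration only: the names `MovingPoint` and `time_slice_query` are hypothetical, and the linear scan stands in for an index lookup.

```python
from dataclasses import dataclass


@dataclass
class MovingPoint:
    """Linear motion model: a location at a reference time plus a velocity vector."""
    x: float
    y: float
    vx: float
    vy: float
    t_ref: float = 0.0

    def position_at(self, t):
        # Project the reported location forward (or backward) to time t.
        dt = t - self.t_ref
        return (self.x + self.vx * dt, self.y + self.vy * dt)


def time_slice_query(objects, rect, t):
    """Return the objects inside rect = (xmin, ymin, xmax, ymax) at time t.

    A plain scan is used here; a moving object index would prune most objects.
    """
    xmin, ymin, xmax, ymax = rect
    result = []
    for obj in objects:
        px, py = obj.position_at(t)
        if xmin <= px <= xmax and ymin <= py <= ymax:
            result.append(obj)
    return result


# Two cars: one heading east, one heading south.
cars = [MovingPoint(0, 0, 1, 0), MovingPoint(5, 5, 0, -1)]
hits = time_slice_query(cars, (2, -1, 4, 1), t=3)
```

In this example only the first car lies inside the query rectangle at time 3. A time interval range query would repeat the containment test over a time range instead of a single time stamp.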

2.2 Principal Components Analysis 

Principal components analysis (PCA) is a commonly used method 
for dimensionality reduction [4, 12] and for finding correlations 
among attributes of data [15]. It examines the variance structure 
in the data set and determines the directions along which the data
exhibits high variance. In our case, if we map the velocities of objects
into the 2D velocity space as points, then the axis with high
variance is the DVA.

Given a set of k-dimensional data points, PCA finds a ranked set
of orthogonal k-dimensional eigenvectors $v_1, v_2, \ldots, v_k$ (which we
call principal component vectors) such that:

• Each principal component (PC) vector is a unit vector, i.e.,
$\sqrt{p_{i1}^2 + p_{i2}^2 + \cdots + p_{ik}^2} = 1$, where $p_{ij}$
($i, j = 1, 2, \ldots, k$) is the j-th component of the PC vector $v_i$.

• The first PC $v_1$ accounts for most of the variability in the
data, and each succeeding component accounts for as much
of the remaining variability as possible.
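For 2D velocity points the first principal component has a closed form, which the sketch below uses. It is an illustration under stated assumptions: the function name `first_principal_axis` is hypothetical, the data is mean-centered as in standard PCA, and the angle formula is the usual one for the leading eigenvector of a symmetric 2x2 covariance matrix.

```python
import math


def first_principal_axis(points):
    """Return the first PC of a set of 2D points as a unit vector (cos, sin)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Orientation of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


# Velocity points scattered around the 45-degree diagonal.
velocities = [(1, 1), (-1, -1), (2, 2.1), (-2, -1.9)]
axis = first_principal_axis(velocities)
```

The returned vector has unit length, matching the first property above, and points roughly along the diagonal for this sample.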

2.3 k-means Clustering

k-means clustering [18] is a method commonly used to automatically
partition a data set into k clusters where each data point
belongs to the cluster with the nearest centroid. It starts by assigning
each object to one of k clusters either randomly or using some
heuristic method. The centroid of each cluster is computed and
each point is re-assigned to its closest cluster centroid. When all
points have been assigned, the k cluster centroids are recomputed.
The process is repeated until the centroids no longer move.
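The iteration just described can be sketched in a few lines. This is a generic Euclidean k-means on 2D points, not the velocity-specific variant used later for finding DVAs. As an assumption, caller-supplied initial centroids stand in for the random or heuristic initial assignment, and the function name `kmeans` is illustrative.

```python
def kmeans(points, centroids, max_iter=100):
    """Cluster 2D points around the given initial centroids.

    Returns (centroids, assignment), where assignment[i] is the cluster of points[i].
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # centroids no longer move
        centroids = new_centroids
    return centroids, assignment


sample = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centroids, labels = kmeans(sample, [(0, 0), (10, 10)])
```

With this sample the two clusters settle after one update, at centroids (0, 0.5) and (10, 10.5).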

3. RELATED WORK 

In this section, we review existing work on moving object indexes,
specifically R-tree [3] based indexes, the B^x-tree [13], and
dual transform based indexes. We also discuss indexing techniques 
for handling skewed workloads and for handling moving objects on 
road networks. 



3.1 R-tree Based Moving Object Indexes 




Figure 2: MBRs of a TPR-tree growing with time. (a) MBRs and VBRs at time 0; (b) MBRs and VBRs at time 1.

An established approach to index moving objects is to use the
R-tree [3] or its more optimized variant the R*-tree [11] to index
the extents of objects and their current velocities. These indexes
include the TPR-tree [21] and its variant, the TPR*-tree [23], which
optimizes some operations of the TPR-tree. They work by grouping
object extents at the reference time into minimum bounding rect- 
angles (MBRs). Figure 2(a) shows the objects a, b and c grouped 
into the same MBR in node $N_1$. Accompanying the MBRs are the
velocity bounding rectangles (VBRs), which represent the expan- 
sion of the MBRs with time according to the velocity vectors of 
the constituent objects. The rate of expansion in each direction is 
equal to the maximum velocity among the constituent objects in the 
corresponding direction. A negative velocity value implies that the 
velocity is towards the negative direction of the axis. For example, 
in Figure 2(a) we can see that the solid arrow on the left of node $N_1$
has a value of -2. This is because the maximum velocity value of 
the constituent objects in the left direction is 2. Figure 2(b) shows 
the expanded MBRs at time 1. 
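The relationship between an MBR, its VBR, and the expanded MBR at a later time can be sketched as below. The object coordinates are made up for illustration (they are not the values in Figure 2), objects are treated as points, and the function names are hypothetical.

```python
def bounding_rectangles(objects):
    """Compute (mbr, vbr) for point objects given as (x, y, vx, vy).

    Both rectangles are returned as (x_lo, y_lo, x_hi, y_hi).
    """
    xs = [o[0] for o in objects]
    ys = [o[1] for o in objects]
    vxs = [o[2] for o in objects]
    vys = [o[3] for o in objects]
    mbr = (min(xs), min(ys), max(xs), max(ys))
    # Each side of the MBR moves with the extreme velocity in its direction.
    vbr = (min(vxs), min(vys), max(vxs), max(vys))
    return mbr, vbr


def mbr_at(mbr, vbr, t):
    """Expand the reference-time MBR to time t using the VBR."""
    return tuple(m + v * t for m, v in zip(mbr, vbr))


# Three objects a, b and c grouped into one node.
node_objects = [(2, 2, -2, 1), (3, 4, 1, -1), (4, 3, 1, 2)]
node_mbr, node_vbr = bounding_rectangles(node_objects)
expanded = mbr_at(node_mbr, node_vbr, 1)
```

Here the left side of the node moves with velocity -2, so one time unit later the MBR has grown from (2, 2, 4, 4) to (0, 1, 5, 6).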

The MBR and VBR structure described can be extended by replacing
the constituent object extents with smaller MBRs. Applied
recursively, this creates a hierarchical tree structure. The
tree structure is identical to that of the classic R-tree [11]; the only
difference is that the algorithms used to insert, delete and query the tree
also need to take the velocity information into consideration. The
TPR-tree and the TPR*-tree modify the R*-tree's insertion/deletion
and query algorithms.

The insertion and deletion algorithms of the TPR*-tree use a cost 
model proposed by Tao et al. [23] to reduce the expected number 
of node accesses for a range query Q. We briefly describe this 
cost model below. This cost model is also used by our paper for 
analyzing the benefits of a partitioned index in Section 4. 

Consider a moving tree node N and a moving range query Q for
the time interval [0,1] as shown in Figure 3(a). The MBR (VBR)
of N is denoted as $N_R = \{N_{R1-}, N_{R1+}, N_{R2-}, N_{R2+}\}$ ($N_V =
\{N_{V1-}, N_{V1+}, N_{V2-}, N_{V2+}\}$), where $N_{Ri-}$ ($N_{Vi-}$) is the
coordinate (velocity) of the lower boundary of N on the i-th dimension,
where $i \in \{1, 2\}$. Similarly, $N_{Ri+}$ ($N_{Vi+}$) refers to the upper
boundary. The MBR (VBR) of Q can be denoted similarly.

Figure 3: Sweeping region of moving node. (a) Moving node N, Q; (b) Transformed node N'.

The sweeping regions of N and Q are the regions swept by N
and Q during the time interval [0,1] (the grey regions shown in
Figure 3(a)). To determine whether node N intersects Q, we first
define the transformed node N' with respect to Q as follows: the
MBR of N' in the i-th dimension is $(N_{Ri-} - |Q_{Ri}|/2,\; N_{Ri+} + |Q_{Ri}|/2)$;
the VBR of N' in the i-th dimension is $(N_{Vi-} - Q_{Vi+},\; N_{Vi+} - Q_{Vi-})$.
Checking whether node N intersects Q during
the time interval [0,1] is equivalent to checking whether the transformed
node N' intersects the center of Q (which is a point) during
the time interval [0,1]. Therefore, the probability of N intersecting
Q (which is the probability of node N being accessed by the query
Q) during the time interval [0,1] is the same as the probability of
N' intersecting the center of Q during the time interval [0,1], which
equals the area of the sweeping region of N' in the time interval
[0,1] (the grey region shown in Figure 3(b)), assuming that
the MBR of Q is uniformly distributed in the data space and the data
space has a unit extent in each dimension. Adding up this probability
for every node of the tree, we obtain the expected number of
node accesses for the range query Q as:

$$\sum_{\text{every node } N \text{ in the tree}} V_{N'}(q_T), \qquad (1)$$

where $q_T$ is the query time interval and $V_{N'}(q_T)$ is the volume of the
sweeping region of N' during $q_T$.
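The equivalence between the direct intersection test and the test on the transformed node N' can be checked numerically. The sketch below is illustrative only: the function names and the sample node and query are assumptions, and each MBR or VBR is given as one (lower, upper) pair per dimension.

```python
def intersects_direct(n_mbr, n_vbr, q_mbr, q_vbr, t):
    """Does moving node N overlap moving query Q at time t?"""
    for (nl, nh), (nvl, nvh), (ql, qh), (qvl, qvh) in zip(n_mbr, n_vbr, q_mbr, q_vbr):
        if nl + nvl * t > qh + qvh * t or ql + qvl * t > nh + nvh * t:
            return False
    return True


def transform_node(n_mbr, n_vbr, q_mbr, q_vbr):
    """Build the MBR and VBR of the transformed node N' with respect to Q."""
    t_mbr = [(nl - (qh - ql) / 2, nh + (qh - ql) / 2)
             for (nl, nh), (ql, qh) in zip(n_mbr, q_mbr)]
    t_vbr = [(nvl - qvh, nvh - qvl)
             for (nvl, nvh), (qvl, qvh) in zip(n_vbr, q_vbr)]
    return t_mbr, t_vbr


def intersects_transformed(t_mbr, t_vbr, q_mbr, t):
    """Does N' at time t contain the center of Q's MBR at time 0?"""
    for (lo, hi), (vlo, vhi), (ql, qh) in zip(t_mbr, t_vbr, q_mbr):
        center = (ql + qh) / 2
        if not (lo + vlo * t <= center <= hi + vhi * t):
            return False
    return True


# A node that expands and a query window that moves towards it.
NODE_MBR, NODE_VBR = [(2, 4), (2, 3)], [(-1, 1), (0, 1)]
QUERY_MBR, QUERY_VBR = [(6, 8), (4, 6)], [(-2, -2), (-1, -1)]
T_MBR, T_VBR = transform_node(NODE_MBR, NODE_VBR, QUERY_MBR, QUERY_VBR)
```

For this sample both tests report no intersection at time 0 and an intersection at time 1, which is the behaviour the transformation is meant to preserve.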

3.2 The B^x-tree

Figure 4: Query enlargement in the B^x-tree



The B^x-tree [13] indexes moving objects using the B+-tree. This
is a challenge because the B+-tree indexes a 1D space but objects
move in a 2D space with associated velocities as well. The B^x-tree
addresses this challenge by first partitioning the 2D space using a
grid, and then using a space-filling curve (Hilbert curve or Z-curve)
to map the location of each grid cell to a 1D space where 2D proximity
is approximately preserved. The locations of the moving objects
are indexed relative to a common reference time.

The B^x-tree incorporates the fact that objects are moving by enlarging
the query window according to the maximum velocity of
the objects. If the query time is far in the future, and therefore
very different from the index reference time, then the query may
be enlarged significantly. Figure 4 shows an example of how the
window enlargement works. Supposing that the current time is 0,
we issue a predictive time slice range query Q at time 2 (the solid
rectangle). Consider moving points a and b (the black dots),
which are stored in the B^x-tree and indexed relative to timestamp 5. From
their velocities as shown in Figure 4, we can infer their positions
at timestamp 2, which are a* and b* (the circles). The window enlargement
technique enlarges the range query Q using the reverse
velocities of a and b to get the query window at timestamp 5 (the
dashed rectangle). In practice, histograms on a grid base are maintained
for the maximum/minimum velocity of different portions of
the data space and the query window is enlarged according to the
maximum/minimum velocity in the region it covers. Therefore, a
drawback of the B^x-tree is that, if only a few objects have a high
speed, they would make the enlarged query window unnecessarily
large for most of the objects.
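The window enlargement can be sketched as follows. As a simplifying assumption, a single pair of minimum and maximum velocities per dimension stands in for the grid-based velocity histograms, and the function name `enlarge_query` is hypothetical.

```python
def enlarge_query(rect, v_min, v_max, t_query, t_ref):
    """Enlarge a query window from the query time to the index reference time.

    rect is ((x_lo, x_hi), (y_lo, y_hi)); v_min and v_max hold the minimum and
    maximum object velocity in each dimension. An object indexed at position p
    (at t_ref) with velocity v is at p + v * (t_query - t_ref) at query time, so
    the enlarged window must cover every p that can land inside rect.
    """
    dt = t_query - t_ref
    enlarged = []
    for (lo, hi), vmin, vmax in zip(rect, v_min, v_max):
        shifts = (vmin * dt, vmax * dt)  # extreme displacements from t_ref to t_query
        enlarged.append((lo - max(shifts), hi - min(shifts)))
    return tuple(enlarged)


# Query at time 2 against objects indexed relative to timestamp 5.
window = enlarge_query(((10, 20), (10, 20)), v_min=(-1, -1), v_max=(2, 1),
                       t_query=2, t_ref=5)
```

With these values the window grows from (10, 20) to (7, 26) along x and to (7, 23) along y. The single fast velocity of 2 along x accounts for most of the growth, which illustrates the drawback noted above.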

To reduce the amount of query window enlargement, the B^x-tree
partitions the index into multiple time buckets, where all objects 
indexed within the same time bucket are indexed using the same 
reference time. This results in a smaller difference between the 
reference time and query time and thus reduces the query window 
enlargement. When objects are updated, they are moved from the 
time bucket they are currently residing in to the future time bucket. 

3.3 Dual Transform Based Moving Object Indexes

The earlier work on dual transform based moving object indexes 
[1, 16] was improved upon by more recent indexes such as STRIPES 
[20], the B^dual-tree [25] and [17]. They index objects in the dual
space, i.e. a 4-dimensional space consisting of two dimensions for 
the location of an object and another two dimensions for the ve- 
locity of the object. A consequence of indexing the velocity as 
separate dimensions is that the moving objects are effectively in- 
dexed as stationary objects. All objects are indexed based on the 
same reference time of 0. A drawback of indexing all objects at 
the same reference time is that the query search space continues 
to grow with time, which is overcome by periodically replacing the
old index with a new index with an updated reference time. 

Dual transform based moving object indexes differ from our work 
by not exploiting velocity distribution skew to index objects travel- 
ing along different dominant velocity axes (DVAs) separately. 

3.4 Indexing Techniques that Handle Skewed Workloads

Zhang et al. [27] propose the P+-tree, which efficiently handles
both range and kNN queries for different data distributions including
skewed distributions. Their work differs from ours in that their
index is designed for stationary objects instead of moving objects. 
Tzoumas et al. [24] propose the QU-Trade technique for indexing 
moving objects that adapts to varying query versus update distribu- 
tions by building an adaptive layer on top of the R-tree or TPR-tree. 
Our work differs from this by adapting to velocity distributions instead
of query versus update distributions. Chen et al. [7] propose
the ST²B-tree, which improves the B^x-tree by making it adaptive
to data and query distribution. This is done by dynamically adjusting
the reference points and grid sizes. Our work differs from
this by creating separate indexes according to velocity distributions
instead of adjusting the reference points and grid sizes. Our VP
technique can be applied in a straightforward manner to the QU-Trade
technique and the ST²B-tree because their underlying structures
are the TPR-tree and the B^x-tree, respectively.

Dittrich et al. [8] propose a main memory indexing technique 
called MOVIES for moving objects. MOVIES assumes that the 
whole data set resides in memory and the update rate is very high 
(greater than 5,000,000 per second) whereas our technique does not 
make such assumptions. 

3.5 Indexing Techniques for Moving Objects on Networks

There are many existing papers [2, 5, 9, 10] which model the
movement of objects along any type of network including road networks.
Our paper does not assume that every object must move in
a road network; in other words, our technique works for generic
scenarios where objects can move freely. Objects moving in road
networks are just one of the motivating examples, in which our
technique brings great performance gain due to the few dominant
directions of object movements.

4. HOW VELOCITY PARTITIONING REDUCES SEARCH SPACE EXPANSION

In this section, we analytically show how a velocity partitioned 
index can reduce the rate of search space expansion. We focus our 
analysis on the B^x-tree and the TPR-tree variants. We first give
an intuitive description of a partitioned index versus unpartitioned 
index. Second, we define search space expansion. Third, we an- 
alytically contrast the rate of search space expansion between an 
unpartitioned index versus a partitioned index. Finally, we present 
preliminary experimental verification of our analysis. 

Partitioned index. The main idea of the velocity partitioning (VP) 
technique is to index objects moving along different DVAs (direc- 
tions) in separate indexes. It is important to note that the VP tech- 
nique is not restricted to pairs of DVAs that are perpendicular to 
each other, but rather will work for any number of DVAs separated 
by any angle. Here we first use a simple example to illustrate the 
concept of the VP technique. Later in Section 5, we provide a de- 
tailed description of how the VP technique is performed. Figure 
5 shows an example of objects indexed by an unpartitioned index 
versus the same objects indexed by a partitioned index. In this ex- 
ample, objects are moving along two DVAs, the x-axis and the y- 
axis. In the unpartitioned index, all objects are indexed by the same 
index. In the partitioned index, objects moving along the x-axis are 
indexed in a separate index from those moving along the y-axis. 

Search space expansion. First, we define what we mean by search 
space expansion. The search space for a query describes the data 
space that is covered (accessed) when processing the query. The 
expansion of the search space is determined by the relative move- 
ment between the query and the tree nodes. The size of the search 
space is proportional to the number of tree nodes accessed by a 
query Q, which can be estimated using a cost model proposed by 
Tao et al. [23] for the TPR-tree/TPR*-tree. The cost model was 
described in Section 3.1 and given as Equation 1. 

Although the cost model was designed for the TPR-tree, it also
applies to the B^x-tree as follows. For the B^x-tree, the query expands
but the tree nodes are stationary, which is a special case of
the analysis used for Equation 1 where both the query and the tree
node are moving and expanding.

The idea behind the cost model of Equation 1 is that we can 
always transform a moving/expanding query into a stationary one 
by making relative adjustments to tree nodes. For example, an ex- 
panding query and a stationary tree node can be transformed into 
a stationary query by expanding the tree node by the amount the 
query was supposed to expand. Following this line of argument, 
we only consider the expansion of the tree node in the following 
analysis without loss of generality. 

Figure 6 shows an example of the search space of the example
shown in Figure 5. In the example, S is the search space of the unpartitioned
index, and $S'_x$ and $S'_y$ are the search spaces of a partitioned
index in the x- and y-axes, respectively. We also assume that all
objects are traveling either along the x- or y-axes, as was the case
for Figure 5. The example shows that the search space expands by
a quadratic factor for the unpartitioned index versus a linear factor
for the partitioned index.

Analysis of search space expansion of unpartitioned versus partitioned
index. We will first analyze a simplified scenario as shown
in Figure 6, and then discuss more general situations in Section 4.1.
In this simplified scenario, we assume that: (i) the velocities of all
the objects are exactly along the standard x- or y-axes; (ii) the objects
travel at the same speed along all directions; (iii) the extent
lengths of the tree nodes along the x- and y-axes are the same; and
(iv) the initial locations of objects are uniformly distributed in the
2D space. The symbols used in Figure 6 are described as follows.
N' is the transformed rectangle of the node N with respect to the
query for the unpartitioned index at the initial time 0; $N'_x$ and $N'_y$
are the transformed rectangles of the node N for the partitioned index
for the x- and y-axes, respectively; v is the maximum speed for
the objects in S along both the x- and y-axes. The extent length of
all the nodes is d. This assumption is reasonable since we are more
interested in the rate of expansion of the search space rather than
its initial size.

Let S' denote the combined search space of the partitioned index
in the x-axis, $S'_x$, and the y-axis, $S'_y$ (as shown in Figures 6(b) and
6(c), respectively). Our aim is to show that the rate at which the
unpartitioned search space S expands is higher than the rate at
which the partitioned search space S' expands. We quantify the
search space as the volume created by integrating the search area
from time 0 to the query predictive time $t_h$, where query predictive
time refers to the future time of the query. The search area expands
with time, therefore we start by expressing the search area of the
unpartitioned index N' as a function of time t, $A_{N'}(t)$, as follows:

$$A_{N'}(t) = (d + 2vt)(d + 2vt) = d^2 + 4vtd + 4v^2t^2 \qquad (2)$$

We are interested in the total expansion of the search area of
the partitioned index including both the x-axis index and y-axis
index. Therefore, let $A_{CN'}(t)$ be the combined area of $N'_x$ and
$N'_y$ as a function of time t. $A_{CN'}(t)$ can be computed as follows:

$$A_{CN'}(t) = A_{N'_x}(t) + A_{N'_y}(t) = (d + 2vt)d + d(d + 2vt) = 2d^2 + 4dvt \qquad (3)$$

We next compute the search volume of S. It is important to
compute the search volume rather than just the expanded search
area since the volume includes the cumulative expansion of the area
from time 0 to $t_h$. We compute the search volume $V_S$ of S by
integrating the search area $A_{N'}$ from time 0 to $t_h$ as follows:

$$V_S(t_h) = \int_0^{t_h} A_{N'}(t)\,dt = \int_0^{t_h} (d^2 + 4vtd + 4v^2t^2)\,dt = d^2 t_h + 2dv t_h^2 + \frac{4}{3} v^2 t_h^3 \qquad (4)$$

Similarly, the search space volume of S' from time 0 to $t_h$, $V_{S'}$,
can be computed as follows:

$$V_{S'}(t_h) = \int_0^{t_h} A_{CN'}(t)\,dt = \int_0^{t_h} (2d^2 + 4dvt)\,dt = 2d^2 t_h + 2dv t_h^2 \qquad (5)$$

In order to compare the search space of the partitioned index 
versus the unpartitioned index, we compute the difference between 
the search space volume of the partitioned search space S' versus 
the unpartitioned search space S as a function of time, $\Delta V(t_h)$, as
follows:

$$\Delta V(t_h) = V_{S'}(t_h) - V_S(t_h) = 2d^2 t_h + 2dv t_h^2 - \left(d^2 t_h + 2dv t_h^2 + \frac{4}{3} v^2 t_h^3\right) = d^2 t_h - \frac{4}{3} v^2 t_h^3 \qquad (6)$$

Figure 5: Objects indexed by an unpartitioned index versus the same objects indexed by a partitioned index. (a) Tree node of unpartitioned index; (b) Tree nodes of partitioned index.

Figure 6: Search space of unpartitioned index, S, versus search space of partitioned index, $S'_x$ plus $S'_y$.



From Equation 6 we can see that as time increases the search
volume of the unpartitioned space $V_S$ becomes increasingly larger
than the search volume of the partitioned space $V_{S'}$. This can be
seen by the fact that $\Delta V(t_h)$ is negative when $t_h$ is greater than
$\frac{\sqrt{3}\,d}{2v}$. Therefore, when time $t_h$ passes the
$\frac{\sqrt{3}\,d}{2v}$ threshold, the unpartitioned search volume $V_S$
becomes larger than the partitioned search volume $V_{S'}$.

Next, we analyze the rate of change in the search space, by taking
the derivative of Equation 6. This is stated as follows:

$$\frac{d\,\Delta V(t_h)}{dt_h} = d^2 - 4v^2 t_h^2 \qquad (7)$$

Equation 7 shows that the search volume of the unpartitioned index
expands at a much faster rate than the partitioned index. This can
be seen by the fact that the rate at which the search volume of the unpartitioned
index increases above the partitioned index is a squared
factor of both v and $t_h$, because $\frac{d\,\Delta V(t_h)}{dt_h}$ is a squared
factor of both v and $t_h$.

The above analysis is with respect to a single node. It obviously 
applies to any node in the tree and when summing up the search 
space for all the tree nodes, we reach the conclusion that the query 
search space on a partitioned index grows much slower with time 
than the query search space on an unpartitioned index. The follow- 
ing experiment on a real data set validates this result. 
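The closed forms of Equations 4 and 5 are easy to evaluate numerically. The sketch below does so for illustrative values of d and v (these are not parameters from the experiments), with hypothetical function names. The crossover time is obtained by setting the difference of the two volumes to zero.

```python
import math


def volume_unpartitioned(d, v, t_h):
    """Search volume of the unpartitioned index (Equation 4)."""
    return d * d * t_h + 2 * d * v * t_h ** 2 + (4 / 3) * v * v * t_h ** 3


def volume_partitioned(d, v, t_h):
    """Combined search volume of the two partitioned indexes (Equation 5)."""
    return 2 * d * d * t_h + 2 * d * v * t_h ** 2


def crossover_time(d, v):
    """Time after which the unpartitioned volume exceeds the partitioned one."""
    return math.sqrt(3) * d / (2 * v)


d, v = 10.0, 1.0
t_star = crossover_time(d, v)
ratio_at_60 = volume_unpartitioned(d, v, 60) / volume_partitioned(d, v, 60)
```

For d = 10 and v = 1 the two volumes are equal at roughly t = 8.66. At t = 60 the unpartitioned volume is more than four times the partitioned one, because its cubic term dominates.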




Figure 8: Chicago road network 



Experimental verification of the analysis. Figure 7 shows the 
results of an experiment, which illustrates the 2D search space expansion
for an unpartitioned TPR*-tree and an unpartitioned B^x-tree
versus a near 1D search space expansion for their partitioned
counterparts. The indexes are partitioned using our VP technique 
(detailed in Section 5). The experiment uses data generated from a 
portion of the road network of Chicago shown in Figure 8. The ex- 
periment involved 100,000 moving objects, with maximum speed 
of 100 meters per time stamp, with a query predictive time of 60 
time stamps. Details of other parameters of the experiment are the 
default parameters described in the experimental study (Section 6). 

Figures 7(a) and 7(b) show the velocity expansion rate of the leaf
MBRs for the unpartitioned TPR*-tree and partitioned TPR*-tree,
respectively. The results show that the leaf nodes of the unpartitioned
TPR*-tree expand in a 2D space whereas those of the partitioned
TPR*-tree expand in a near 1D space. Similarly, Figures 7(c) and
7(d) show the query expansion rate of the unpartitioned B^x-tree
and partitioned B^x-tree, respectively. Again, the query of the unpartitioned
B^x-tree expands in a 2D space, whereas that of the partitioned
B^x-tree expands in a near 1D space.

4.1 Discussion of General Cases 

In the analysis of the simplified scenario, we have made several 
assumptions. To lift the first assumption, when the velocities of 
objects are not exactly along the standard x- or y-axes, as long as 
their directions are close to the standard x- or y-axes, the previous 
analysis still holds since a small deviation from the dominant ve- 
locity axis (DVA) incurs a small search space expansion. However, 
if some objects' directions are not close to any of the DVAs, we 
will put these objects into an outlier partition. Details of the outlier 
partition will be discussed in Section 5.2. 

An implicit assumption we also made in the previous analysis is 
that there are two DVAs, one is vertical and the other is horizontal. 
This assumption may not hold in practice. Therefore, in our VP 
technique, we first find out the actual DVAs (through a combination 
of PCA and k-means clustering). Then, the previous analysis still
holds when we replace the x- and y-axes with the actual DVAs. 
Details of how to find the DVAs will be discussed in Section 5.1. 

5. THE VELOCITY PARTITIONING TECHNIQUE

We present our VP technique in this section. Figure 9 shows the
system architecture for the VP technique. The system has two main
components, a velocity analyzer and an index manager. The velocity
analyzer partitions a sample of the velocity of objects from the
current workload in order to find the DVAs and an outlier threshold
(used to determine which objects belong to the outlier partition).
Velocity is a 2D point in the velocity space, so we refer to the velocity
of an object as a velocity point. The index manager takes
the output of the velocity analyzer to transform the query, insertion
and deletion operations to operate on the DVA indexes and outlier
index. A DVA index is the same as a traditional moving object index
such as the TPR-tree or the B^x-tree except objects are indexed
using a transformed coordinate space according to the DVA. The
index manager inserts an object into the closest DVA index unless
it is far from all DVAs, in which case, the object is inserted into
the outlier index. If an object update causes its direction of travel
to change sufficiently, it may be moved from one index to another.
Processing a query involves transforming the query into the coordinate
space of each index, and then querying all the indexes and
combining the results.

Figure 7: Search space expansion of the unpartitioned versus partitioned B^x-tree and TPR*-tree on the Chicago data set. (a) Unpartitioned TPR*-tree; (b) Partitioned TPR*-tree; (c) Unpartitioned B^x-tree; (d) Partitioned B^x-tree.

Figure 9: The system architecture of the VP technique

We provide a more detailed description of the velocity analyzer
in this section since it is the key component of the system. The
velocity analyzer analyzes the sample of velocity points to determine
the partition boundaries for future object insertions and querying.
The partition boundaries are determined by the DVAs in the data set
and an outlier threshold τ. We observe that when there are multiple
DVAs in the data set, PCA alone may not identify the DVAs correctly.
Therefore, we propose to use a combination of PCA and k-means
clustering on the sample velocity points to determine the DVAs. Here
k is an input value given by the user based on observation of the data
set or experience. For example, most road networks have two dominant
traffic directions, so we can set k to 2. Once the DVAs are determined,
the objects can be partitioned based on the closeness of their velocity
directions to the directions of the DVAs. However, some velocity points
may not be close to any DVA. Those objects are placed in an outlier
partition. We determine the boundary of the outlier partition using a
threshold τ, which defines an upper bound on the perpendicular distance
from the DVA that a DVA partition will accept. We choose the τ value
for every partition by analyzing the sample data set using a search
space-based cost function.

Algorithm 1 summarizes the VP algorithm used by the velocity
analyzer. It starts by finding the DVAs using a combination of PCA
and k-means clustering on the representative sample data (Line 2).
Specifically, we integrate PCA into the clustering process itself by
using PCA to guide the formation and refinement of clusters. At
the end of the clustering process, each cluster contains the velocity
points that form one DVA partition. The 1st PC of each partition is
the DVA for the partition. The partitioning algorithm minimizes the
perpendicular distance from each velocity point to the DVAs. The
reason we minimize the perpendicular distance is that if all velocity
points within one partition have a small perpendicular distance to
the DVA, then those velocity points occupy a near 1D space.

We define a threshold τ for every DVA to determine whether an
object can be accepted into its partition (Line 4). We determine the
optimal τ by minimizing the combined rate of search area expansion
of the DVA partition and the outlier partition. Objects whose velocity
points are not within the perpendicular distance threshold τ of any
DVA are placed in the outlier partition (Line 5). Once all the outlier
velocity points have been removed from the DVA partition, we recompute
the DVA using the remaining velocity points (Line 6). This updated DVA
is a more precise representation of the velocity points now remaining
in the DVA partition. The final DVAs and their associated τ thresholds
are used by the index manager for future insertions and query
processing.

Algorithm 1: VelocityPartitioning(A, k)

Input: A: sample set of velocity points, k: number of DVA partitions
Output: D: set of DVAs with associated outlier thresholds τ

1 let P be the set of k DVA partitions with their associated DVAs
2 P = FindDVAs(A, k) // See Algorithm 2
3 for each p ∈ P do
4     compute the maximum perpendicular distance threshold τ for p
      according to Section 5.2
5     move the velocity points of p whose perpendicular distance from
      the DVA of p is greater than τ into the outlier partition
6     recompute the DVA for the remaining velocity points in p
7 let D be the set of DVAs and associated τ thresholds of P
8 return D



In Section 5.1, we describe how our velocity analyzer finds DVAs.
In Section 5.2, we describe how our velocity analyzer determines
the threshold τ to decide which objects should be placed in the
outlier partition. In Section 5.3, we show how our index manager
handles insertion, deletion and update operations. In Section 5.4,
we show how our index manager performs the range query. Finally,
in Section 5.5, we discuss the issue of changing velocity
distributions.

5.1 Velocity Analyzer: Finding Dominant Velocity Axes (DVAs)

In this subsection, we will first examine two naive approaches to 
finding DVAs, and then present our approach for finding DVAs. 

Naive approach I: PCA. The first naive approach is to apply PCA
to a sample set of velocity points to find the DVAs. Using PCA
to find DVAs is intuitive, since the 1st PC (as described in Section
2.2) represents the principal axis along which the data points lie.
In our case, the data points are velocity points, so the 1st PC
represents the principal axis along which objects travel. However,
this approach effectively combines the multiple DVAs in the data
set into one average velocity axis, which does not represent any
of the individual DVAs. PCA is only useful for finding the DVA
when there is only one DVA in the data set. Figure 10(a) shows
the result of applying PCA to a sample of 10,000 velocity points
of cars traveling on the San Francisco road network (shown in
Figure 1). In this case, the data set has two DVAs but the 1st PC is
the average of the two, instead of either of the two individual DVAs.
The 1st PC is far from both DVAs. The 2nd PC is orthogonal to the
1st PC and also does not correspond to any of the DVAs.

Figure 10: Result of applying the two naive approaches to finding
the DVAs for the San Francisco data set

Figure 11: Our partitioning algorithm being applied to the San
Francisco data set shown in Figure 1

Figure 12: Naive approach II versus our approach

Naive approach II: k-means clustering based on distance to
centroid followed by PCA on each cluster. The second naive
approach applies k-means clustering to the velocity points based
on distance to a cluster centroid and then uses PCA on each resultant
cluster to create one DVA per cluster. This does not work well
since it groups objects based on their closeness to a point (the
cluster centroid) rather than closeness to an axis (the dominant
axis). Figure 12(a) shows an example of clustering based on distance
to centroid. In the example there are two cluster centroids C_1 and
C_2 and two objects A and B. The direction of travel of object B is
more aligned to C_1 than C_2; however, the clustering algorithm groups
object B with C_2 since B is closer to C_2. Similar observations can
be made for object A. Figure 10(b) shows the resultant clusters
and corresponding DVAs found on the San Francisco data set when
using k-means clustering where distance to centroid is used as the
distance measure. Note that the two DVAs found by this technique (the
two parallel lines in Figure 10(b) labeled as the 1st PCs of the
partitions) do not resemble the two dominant axes (the two axes with
the highest concentration of data points) of the data set. The reason
is that the clusters created center around the cluster centroids shown
in Figure 10(b) instead of the dominant axes.

Our approach: k-means clustering based on distance to the 1st
PC of each cluster. In our approach, we use k-means clustering on
the velocity points, like naive approach II, but we use the
perpendicular distance to the 1st PC of each cluster (partition) as
the distance measure, instead of distance to a centroid. This allows
objects to be clustered based on their direction of travel. Figure
12(b) shows an example of using our clustering approach, where there
are two clusters with their 1st PCs being PC_1 and PC_2, respectively.
Our algorithm allocates object A to the cluster corresponding to
PC_2 because A has a shorter perpendicular distance to PC_2.
Similarly, object B is placed in the cluster corresponding to PC_1.
This assignment of objects to clusters makes sense since the direction
of travel for object A is more aligned to PC_2 than PC_1, and
similarly for object B.

Algorithm 2: FindDVAs(A, k)

Input: A: set of velocity points, k: number of partitions
Output: P: set of partitions with associated 1st PC

1 let P be the set of k partitions
2 initialize each partition p ∈ P to be empty
3 for each velocity point a ∈ A do
4     randomly assign a into a partition p ∈ P
5 while at least one velocity point has moved into a different partition do
6     compute the 1st PC for each partition in P using PCA
7     for each velocity point a ∈ A do
8         if a is not currently in the partition whose 1st PC has the
          shortest distance from a then
9             move a into the partition whose 1st PC has the shortest
              distance from a
10 return P and associated 1st PCs as the DVA partitions and their
   associated DVAs



Algorithm 2 shows precisely how our k-means clustering algo- 
rithm based on distance to the 1st PC is used to find DVAs. 

Figure 11 shows an example of applying the FindDVAs algorithm
with k = 2 to the San Francisco data set of Figure 1. Figure
11(a) shows the initial random partitions and their corresponding
1st PCs (Lines 3-4 and 6). Note that although the two initial
partitions are randomly created, their two 1st PCs are slightly apart.
Next, Figure 11(b) shows the partitions created after reassigning
velocity points to their closest 1st PCs. Note that after just this
1st reassignment iteration the partitions already closely resemble the
final partitions shown in Figure 11(d). The reason for this is that
the reassignment of points amplifies the difference between the two
1st PCs by putting points that are slightly closer to one of the 1st
PCs in the partition of that 1st PC. Figure 11(c) shows the updated
1st PCs of the partitions after reassigning velocity points (Line 6).
The algorithm continues refining the partitions until they converge to
the final partitions with their corresponding 1st PCs (DVAs) shown
in Figure 11(d).
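
The loop of Algorithm 2 can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch and not the implementation used in our experiments. The function names (`first_pc`, `perpendicular_distance`, `fit_axes`, `find_dvas`), the `seed` and `max_iter` parameters, and the handling of empty partitions are assumptions made for the example. It also assumes that every DVA passes through the origin of the velocity space, so the principal direction is taken from the uncentered second-moment matrix rather than from the covariance matrix.

```python
import numpy as np


def first_pc(points):
    """Unit vector of the 1st principal direction of `points`.

    Assumption: a DVA passes through the origin of the velocity space, so
    the uncentered second-moment matrix is used instead of the covariance.
    """
    moments = points.T @ points
    _, eigvecs = np.linalg.eigh(moments)  # eigenvalues come in ascending order
    return eigvecs[:, -1]


def perpendicular_distance(points, direction):
    """Distance of each 2D point from the line through the origin along `direction`."""
    return np.abs(points[:, 0] * direction[1] - points[:, 1] * direction[0])


def fit_axes(points, labels, k, rng):
    """Compute the 1st PC of every partition (Line 6 of Algorithm 2)."""
    axes = []
    for j in range(k):
        members = points[labels == j]
        if len(members) == 0:  # assumption: re-seed an empty partition
            members = points[rng.integers(len(points))][None, :]
        axes.append(first_pc(members))
    return axes


def find_dvas(points, k, seed=0, max_iter=100):
    """Cluster velocity points by perpendicular distance to each partition's 1st PC."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(points))  # Lines 3-4: random assignment
    for _ in range(max_iter):
        axes = fit_axes(points, labels, k, rng)
        dist = np.stack([perpendicular_distance(points, a) for a in axes], axis=1)
        new_labels = dist.argmin(axis=1)  # Lines 7-9: move to the closest 1st PC
        if np.array_equal(new_labels, labels):  # Line 5: no point has moved
            break
        labels = new_labels
    return labels, fit_axes(points, labels, k, rng)
```

Calling `find_dvas(points, k=2)` on an array of 2D velocity points returns one partition label per point and one unit direction vector per partition. The `max_iter` cap is a safeguard added for the sketch; Algorithm 2 itself terminates when no point changes partition.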

5.2 Velocity Analyzer: the Outlier Partition

Figure 13: The transformed DVA partition and its final DVA
partition after removing outliers. Panels: (a) transformed DVA
partition, (b) final DVA partition after removing the outliers.

Our aim is to have all objects within each partition travelling
in a near 1D space. However, from Figure 13(a) we can see that
the data points, when transformed into the coordinate space formed
by a DVA of Figure 11, do not travel in a near 1D space, due to
the presence of outlier objects. To moderate the influence of these
objects, we place those data points with a perpendicular distance
above a threshold τ from their DVAs into the outlier partition. A
cost analysis is performed upon each DVA partition separately to
assign an individual τ value to each DVA partition. The outlier
partition is indexed in the standard coordinate system since the
objects in it have little correlation with any DVA.

We determine the optimal τ value using a slightly simplified
version of the search space metric defined at the beginning of Section
4. More specifically, we minimize the total rate of expansion of
the areas $A_{N'_d}$ and $A_{N'_o}$ of the transformed leaf nodes of
the DVA and outlier partitions, respectively. We use the same process
as that shown at the beginning of Section 4 to transform the velocities
of the queries into the tree nodes. This minimization metric captures
the change in the search area as a function of time. We focus our
analysis on leaf nodes since non-leaf nodes are typically cached in
the RAM buffer, so the majority of RAM buffer misses are due to leaf
node accesses.

For a given DVA partition and an outlier partition, we define the 
total rate of expansion of the area of the transformed leaf nodes of 
the two partitions as follows: 

$$TA(t, n_d) = L_d A_{N'_d}(t) + L_o A_{N'_o}(t)
  = \frac{n_d}{n_l}(d + 2 v_{xmax} t)(d + 2 v_{y_d}(n_d) t)
  + \frac{n - n_d}{n_l}(d + 2 v_{xmax} t)(d + 2 v_{ymax} t) \qquad (8)$$



where $L_d$ and $L_o$ are the number of leaf nodes in the DVA and
outlier partitions, respectively, $n$ is the total number of objects in
both partitions, $n_d$ is the number of objects in the DVA partition
and $n_l$ is the average number of objects per leaf node. Figure 14
illustrates the other terms used in the equation diagrammatically.
The most important term is $v_{y_d}(n_d)$, since this is the term that
corresponds to the threshold value τ. $v_{y_d}(n_d)$ is the maximum
speed along the y-axis in the DVA partition. $v_{y_d}(n_d)$ is a
function of $n_d$ as we adjust $v_{y_d}(n_d)$ by removing from the DVA
partition the objects whose y component speed is the highest. The
remaining terms are described as follows. $d$ is the length along both
the x- and y-axes of both $N'_d$ and $N'_o$. We use the same $d$ for
all side lengths because we assume a uniform distribution of object
locations. $v_{xmax}$ and $v_{ymax}$ are the maximum speeds of $N'_o$
along the x- and y-axes, respectively. For simplicity, we also suppose
that the maximum speed of $N'_d$ along the x-axis is $v_{xmax}$. This
approximation is reasonable since we partition solely based on the
y-axis maximum speed and therefore we assume that the maximum speed of
object movements along the x-axis is approximately the same for
all partitions.

Figure 14: Diagram used to illustrate the terms used in Equation 8

Next, we take the derivative of TA(t, n d ) with respect to t to 
quantify the rate of expansion of TA(t, n d ): 

$$\frac{d\,TA(t, n_d)}{dt}
  = \frac{2 n_d}{n_l}\bigl((v_{y_d}(n_d) - v_{ymax})(d + 4 v_{xmax} t)\bigr)
  + \frac{2 n}{n_l}\bigl(d\, v_{ymax} + v_{xmax}(d + 4 v_{ymax} t)\bigr) \qquad (9)$$

We need to minimize Equation 9 in order to minimize the rate
of expansion of $TA(t, n_d)$. The only components of the equation that
are not constant are $n_d$ and $v_{y_d}(n_d)$. Therefore, minimizing
Equation 9 is the same as minimizing the following expression:

$$n_d \, (v_{y_d}(n_d) - v_{ymax}) \qquad (10)$$
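
The step from Equation 8 to Equations 9 and 10 can be checked by expanding the derivative explicitly. The following worked sketch uses the notation as read from the definitions above ($n_l$ for the average number of objects per leaf node, $v_{y_d}(n_d)$ for the maximum y speed in the DVA partition). It is a consistency check under those readings, not an additional result.

```latex
\begin{align*}
\frac{d}{dt}\,(d + 2 v_x t)(d + 2 v_y t)
  &= 2 v_x (d + 2 v_y t) + 2 v_y (d + 2 v_x t)
   = 2 d (v_x + v_y) + 8 v_x v_y t, \\[4pt]
\frac{d\,TA(t, n_d)}{dt}
  &= \frac{n_d}{n_l}\bigl[2 d (v_{xmax} + v_{y_d}(n_d)) + 8 v_{xmax} v_{y_d}(n_d)\, t\bigr]
   + \frac{n - n_d}{n_l}\bigl[2 d (v_{xmax} + v_{ymax}) + 8 v_{xmax} v_{ymax}\, t\bigr] \\
  &= \frac{2 n_d}{n_l}\,(v_{y_d}(n_d) - v_{ymax})(d + 4 v_{xmax} t)
   + \frac{2 n}{n_l}\,\bigl(d\, v_{ymax} + v_{xmax}(d + 4 v_{ymax} t)\bigr).
\end{align*}
```

For a fixed $t$, the second term and the factor $2(d + 4 v_{xmax} t)/n_l$ do not depend on the choice of threshold, and that factor is positive. Only $n_d\,(v_{y_d}(n_d) - v_{ymax})$ varies, which is the expression minimized in Equation 10.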



Algorithm for determining optimal τ value. To find the $n_d$ value
that minimizes Equation 10 analytically, we would need to have an
equation describing $v_{y_d}(n_d)$. However, it is hard to find a
general form for $v_{y_d}(n_d)$ because it is data distribution
dependent. Therefore, we use an equal width cumulative frequency
histogram, per DVA partition, to capture the data distribution of
$v_{y_d}(n_d)$. Each bucket of the histogram stores the number of
velocity points in the DVA partition whose y speed does not exceed
the y speed corresponding to the bucket, i.e., the $n_d$ value
obtained when that y speed is the maximum allowed.

Our algorithm finds the τ threshold, for each DVA partition, by
taking a uniform sample of $v_{y_d}(n_d)$ values and computing the
corresponding value of Equation 10. The $v_{y_d}(n_d)$ value giving
the minimum value for Equation 10 is used as τ. This approach incurs
a small computational cost since Equation 10 is simple and can
be computed cheaply. Figure 13(b) shows the final DVA partition
after removing outliers from the transformed partition shown in
Figure 13(a).

Our experimental study (Section 6.1) shows that the algorithm
proposed above is able to find a close to optimal perpendicular
distance τ value for both the B^x-tree and the TPR*-tree.
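
The histogram-based search for τ can be sketched as follows. This is a minimal illustration under stated assumptions rather than our actual implementation: the function name `choose_tau` and its parameters are hypothetical, $v_{ymax}$ is taken to be the largest y speed in the sample, and the candidate thresholds are the upper edges of the histogram buckets.

```python
import numpy as np


def choose_tau(perp_speeds, num_buckets=100):
    """Pick the threshold that minimizes n_d * (v_yd(n_d) - v_ymax) (Equation 10).

    `perp_speeds` holds the y component of each velocity point after it has
    been transformed into the coordinate space of its DVA.
    """
    speeds = np.abs(np.asarray(perp_speeds, dtype=float))
    v_ymax = speeds.max()  # assumption: maximum y speed observed in the sample
    if v_ymax == 0.0:
        return 0.0, len(speeds)
    # Equal width histogram; its running sum is the cumulative frequency.
    counts, edges = np.histogram(speeds, bins=num_buckets, range=(0.0, v_ymax))
    n_d = np.cumsum(counts)   # objects kept if the bucket's upper edge is the threshold
    candidates = edges[1:]    # candidate v_yd(n_d) values, one per bucket
    cost = n_d * (candidates - v_ymax)
    best = int(np.argmin(cost))
    return float(candidates[best]), int(n_d[best])
```

The function returns the chosen threshold together with the number of sample points that remain in the DVA partition. Velocity points whose y speed exceeds the returned threshold would be moved to the outlier partition.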

5.3 Index Manager: Insertion, Deletion and Update

The insertion algorithm is relatively straightforward. First, the
algorithm finds the DVA index i_min whose perpendicular distance
from the object o is the smallest. Then, if the perpendicular distance
of o to i_min is larger than τ, o is inserted into the outlier index;
otherwise, o is inserted into i_min. Before an object is inserted into
i_min, o is first transformed into the coordinate space of i_min using
i_min's 1st PC. The transformation process involves a simple matrix
multiplication between the coordinates of o and the 1st PC of i_min.

When performing deletion, the algorithm first finds the partition 
object o resides in via a simple lookup table, and then uses the base 
index structure's deletion algorithm to delete the object from its 
partition. When an object changes its velocity, an update is per- 
formed on the index. 

An update simply consists of a deletion followed by an inser- 
tion. The updated object will be inserted into the closest DVA index 
which may be different from its original DVA index. If an update 
involves moving an object from one DVA index to another then 
both indexes need to be locked at the beginning of the update to en- 
sure a concurrent query on the destination index does not miss the 
inserted object. This may slightly increase the locking overhead. 
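
The routing decision made at insertion time can be sketched as follows. This is an illustrative sketch only: the names `route_object`, `rotation_for` and `OUTLIER` are hypothetical, each DVA is assumed to pass through the origin of the velocity space, and the transformation is assumed to be the rotation whose rows are the 1st and 2nd PCs.

```python
import numpy as np

OUTLIER = -1  # hypothetical identifier for the outlier index


def rotation_for(dva_direction):
    """Rotation matrix whose rows are the 1st PC (the DVA) and the 2nd PC."""
    ux, uy = np.asarray(dva_direction, dtype=float) / np.linalg.norm(dva_direction)
    return np.array([[ux, uy], [-uy, ux]])


def route_object(position, velocity, dva_directions, taus):
    """Return (index id, position, velocity) in the chosen index's coordinates."""
    position = np.asarray(position, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    rotations = [rotation_for(d) for d in dva_directions]
    # After rotation, the second velocity component is the perpendicular
    # distance of the velocity point from the DVA.
    perp = [abs((r @ velocity)[1]) for r in rotations]
    i_min = int(np.argmin(perp))
    if perp[i_min] > taus[i_min]:
        # Too far from every DVA: keep the standard coordinate system.
        return OUTLIER, position, velocity
    r = rotations[i_min]
    return i_min, r @ position, r @ velocity
```

An update would call the underlying index's deletion on the object's current index (found via the lookup table) and then call `route_object` again with the new velocity, which may select a different index.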
5.4 Index Manager: Range Queries



Algorithm 3: RangeQuery(I, q)

Input: I: set of all indexes including both DVA indexes and the
       outlier index, q: range query
Output: RS: result set

1 for each index i ∈ I do
2     if i is a DVA index then
3         transform the range of q to the coordinate space of index i
          using the 1st PC of i
4         create transformed query q' consisting of a rectangular
          axis-aligned MBR of the transformed range of q
5     else
6         q' = q // index i is the outlier index
7     execute range query q' on index i and store results in URS
8 filter out the objects in URS that are not contained in q and
  add the remaining objects into RS
9 return RS



In this subsection, we present the range query algorithm, which
can be used for both circular and rectangular range queries.
Algorithm 3 details the steps the index manager uses to execute the
range query. The index manager needs to query each of the indexes
separately and merge the results, as the query region may encompass
objects from different indexes. Before querying each DVA index,
we need to first transform the query range into the coordinate space
of the DVA index using the 1st PC of the DVA index (Line 3).
The transformation process involves a simple matrix multiplication
between the coordinates of the query range and those of the 1st PC.
The transformed range is bounded by a rectangular minimum
bounding region (MBR), which is axis aligned with the coordinate
space of the DVA index (Line 4). The transformed query is then
executed on the index using the query algorithm of the underlying
index, such as the B^x-tree or the TPR*-tree (Line 7). Finally,
the objects in the result are filtered to remove any objects that
are in the MBR of the transformed query but are not in the original
query region (Line 8). Note that when querying the outlier index,
no query transformation is needed since the outlier index uses
the standard coordinate system (Line 6).

Figure 15(a) shows an example of a circular range query q with
radius r before it is transformed into the coordinate space of a DVA
index. It also shows the 1st and 2nd PCs of the DVA index.
Figure 15(b) shows the transformed query q', which is bounded by
an axis aligned MBR in the coordinate space of the DVA index
formed by the 1st and 2nd PCs.




Figure 15: Circular range query before and after transforming
into a DVA index's coordinate space. Panels: (a) before
transformation, (b) after transformation.

Our system supports all three query types described in Section
2.1, namely the time slice range query, the time interval range query,
and the moving range query. We discuss the moving range query since
it is the most general form of the three query types. After
transforming the range query into the transformed coordinate system
and applying the filtering step (Line 8 of Algorithm 3), the same
object containment relationship with the original query is retained.
The query velocity can also be transformed into the new coordinate
system and the query can be executed in the standard way.
Thus, our system supports the same query types as the underlying
indexes (the B^x-tree and the TPR*-tree), including the three query
types discussed in Section 2.1.
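
The transform, search and filter steps of Algorithm 3 can be sketched for a circular time slice query at the current time. This is a simplified illustration, not our implementation: each index is replaced by a hypothetical in-memory array of object positions that are already stored in that index's coordinate space, object motion is ignored, and the names `range_query` and `rotation_for` are assumptions made for the example.

```python
import numpy as np


def rotation_for(dva_direction):
    """Rotation matrix whose rows are the 1st PC (the DVA) and the 2nd PC."""
    ux, uy = np.asarray(dva_direction, dtype=float) / np.linalg.norm(dva_direction)
    return np.array([[ux, uy], [-uy, ux]])


def range_query(center, radius, indexes):
    """Circular range query over in-memory stand-ins for the indexes.

    `indexes` is a list of (rotation, ids, stored) tuples, where `rotation`
    is None for the outlier index and `stored` holds object positions that
    are already expressed in that index's coordinate space.
    """
    center = np.asarray(center, dtype=float)
    result = []
    for rotation, ids, stored in indexes:
        # Lines 2-6: transform q for a DVA index; the outlier index uses q as is.
        q_center = center if rotation is None else rotation @ center
        lo, hi = q_center - radius, q_center + radius  # axis aligned MBR of q'
        # Line 7: stand-in for the range search of the underlying index.
        in_mbr = np.all((stored >= lo) & (stored <= hi), axis=1)
        # Line 8: a rotation preserves distances, so containment in the
        # original circle can be checked in the index's coordinates.
        inside = np.linalg.norm(stored - q_center, axis=1) <= radius
        result.extend(ids[i] for i in np.flatnonzero(in_mbr & inside))
    return result
```

Because the transformation is a rotation, a circular range keeps its shape and only its center moves. An object can therefore fall inside the MBR of the transformed query yet lie outside the circle, which is why the filtering step is needed.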



5.5 Handling Changing Velocity Distributions 

In theory, if the dominant direction of object travel changes
significantly, we would need to rerun the velocity analyzer to
determine new DVAs, and then readjust the indexes to align with the
new DVAs. However, we find that in real life the direction component
of the velocity distribution changes little since the routes of
the moving objects are usually fixed. This is intuitive as velocity
distributions are usually dictated by rarely changing environmental
factors, such as road networks, flight paths and shipping lanes.
Therefore, the dominant direction of object travel is likely to be
stable. However, the speed component of the velocity distribution is
likely to change with time. For example, during the morning rush
hour there will be many cars travelling into the city, resulting in
reduced speeds. In contrast, during this time, there will be few cars
moving out of the city and they will be moving fast. The opposite
is true during the afternoon rush hour. The speed distribution has no
effect on the coordinate system of the DVA indexes since the cars
still travel along the same DVA. However, it does affect the value
of the threshold τ, since τ is determined by the y-axis speed
distribution of objects moving in the transformed coordinate system of
the DVA indexes. We handle this situation by continuously updating
the histogram used to determine τ, and then periodically computing
an updated τ. Computing τ incurs only a small computational
overhead because the expression used to derive it is simple.

6. EXPERIMENTAL STUDY 

In this section, we report the results of experiments comparing
the performance of our VP technique applied to the B^x-tree [13]
and the TPR*-tree [23] against their unpartitioned counterparts.
First, we evaluate the ability of our algorithm to find the optimal τ
threshold value. Second, we measure the overhead incurred by the
velocity analyzer. Third, we compare both the query and update
performance of the algorithms across various data sets. Fourth, we
compare the query performance of the algorithms for varying data
sizes. Fifth, we measure the effect of varying the maximum speed
of object movement. Sixth, we compare the query performance of
the algorithms for varying query predictive time. Finally, we show
representative results for the rectangular range query.

The experiments were conducted based on the benchmark de- 
fined in Chen et al. [6] for evaluating moving object indexes. The 
road network and synthetic (uniform) data sets used in the exper- 
iments were generated using the benchmark's data generator pro- 
vided by Chen et al. [6]. To generate the road network data sets 
we fed the road network nodes and edges into the benchmark gen- 
erator. The road network nodes and edges were all generated us- 
ing the XML map data from the OpenStreetMap web site (Open- 
StreetMap.org). We generated four road network data sets. Their 
characteristics can be summarized as follows: 

• The New York (NY) and the Melbourne CBD (MEL) road
networks contain the largest number of nodes and edges, and
hence the shortest average edge length. Therefore, both road
networks have the highest update frequency.

• Both the Chicago (CH) and the San Francisco (SA) road
networks contain fewer nodes and edges and hence
both have a smaller number of updates compared to the MEL
and the NY networks.

• The CH road network's velocity distribution is the most
skewed, followed by the SA, the MEL and the NY road networks.

We focus our experimental study on the circular time slice range
query, with a future predictive time ranging from 0 to 120 time
stamps as described in Table 1. We focus on the circular query
because it resembles many real world occurrences and is also used
in the filter step of the k Nearest Neighbor query. The circular
range query specifies a range, which is a certain distance from a
point. For example, a taxi driver is interested in potential
passengers within 200 meters of the taxi, or a tank wants to know if
there are any other tanks within one kilometer of itself. We use the
circular range query as the default query. We have performed the same
set of experiments for the rectangular range query and the results are
similar to those for the circular range query. We show representative
results for the rectangular range query in Section 6.8.

Figure 16: Other tested road networks. Panels: (a) Melbourne CBD,
(b) New York CBD.

Parameter                     Setting
Space domain (m^2)            100,000 x 100,000
Cardinality of objects        100K, 500K
Max. object speed (m/ts)      20, 100, 200
Max update interval (ts)      120
Range query radius (m)        100, ..., 500, ..., 1000
Query predictive time (ts)    0, 10, 60, 120
Time duration (ts)            240, 600
RAM buffer size (pages)       50
Disk page size                4KB
Data distribution             CH, MEL, SA, NY, uniform

Table 1: Parameters and their settings

The parameters used in the experiments are summarized in Table
1, where values in bold denote the default values used.

We evaluate our VP technique applied on top of two state-of-the-art
moving object indexes of contrasting styles, the B^x-tree [13]
and the TPR*-tree [23], against their unpartitioned counterparts
(indexes that have not been velocity partitioned). We used the source
code for the TPR*-tree and the B^x-tree provided by Chen et al. [6].
All code was implemented in C++ under Microsoft Visual C++
2008 running on Microsoft Windows 7 Professional SP1. The
algorithms compared are described as follows:

• B^x-tree. The B^x-tree [13] has two time buckets and uses the
Hilbert curve for space partitioning. We use the improved
iterative expanding query algorithm [14] to reduce query
enlargement. The histogram used contains 1000 x 1000 cells.

• TPR*-tree. The TPR*-tree [23] is optimized for a query size
of 1000 x 1000 m^2.

• B^x(VP)-tree and TPR*(VP)-tree. The VP technique applied
to the B^x-tree and the TPR*-tree is denoted as the B^x(VP)-tree
and the TPR*(VP)-tree, respectively. Both trees use a velocity
histogram containing 100 buckets for determining the τ
value. We set the number of DVA indexes to 2 because we
found that in almost all road network data sets, the roads
were aligned to two main axes. The settings for the underlying
B^x-tree and TPR*-tree are the same as above. The
velocity analyzer used for both indexes used 10,000 sample
velocity points.

Our experiments measure the following metrics: average I/O per 
query; average I/O per update; average execution time per query; 
and average execution time per update. The execution time results 
include both CPU and I/O time. The update metric results are only 
reported for one experiment because this paper is focused on im- 
proving query performance. 

All experiments were conducted on a PC powered by Intel Core 
i7 CPU 2.8GHz with 8GB DDR3 main memory. 

6.1 Finding Optimal τ Threshold

In this experiment, we examine the effectiveness of our algorithm
(see Section 5.2) at finding the optimal τ threshold for
each index. As mentioned before, τ is used to determine which
objects should be placed in the outlier index. We compared the
B^x(VP)-tree and the TPR*(VP)-tree using different fixed τ
thresholds against the B^x(VP)-tree and the TPR*(VP)-tree
automatically finding the threshold value according to the algorithm
of Section 5.2. We used both the CH and SA road network data sets
for this experiment. The results are shown in Figure 17. In Figure
17, the straight lines represent the B^x(VP)-tree and the
TPR*(VP)-tree using the automatic algorithm for determining τ and the
curves represent the B^x(VP)-tree and the TPR*(VP)-tree using
different fixed τ thresholds. The results show that the VP technique
is able to automatically compute a near optimal τ threshold for both
real data sets and both moving object indexes.

Figure 17: τ algorithm versus varying fixed τ threshold

6.2 Velocity Analyzer Overhead

Figure 18: Overhead of velocity analyzer

In this experiment, we measure the overhead of running our ve- 
locity analyzer as described in Sections 5.1 and 5.2. The velocity 
analyzer partitions the sample velocity points using a combination 
of PCA and k-means clustering to arrive at the DVA index bound- 
aries. We performed this experiment across the four road networks, 
CH, SA, MEL, NY and the uniform synthetic data set. We have run 
each data set five times and reported the average execution time. 
The results are shown in Figure 18. The results show that the over- 
head of the velocity analyzer over all tested data sets is low, taking 
between 50 milliseconds and 97 milliseconds. 

6.3 Effect of Varying Data Sets 

In this experiment, we compare the algorithms across the four
road networks CH, SA, MEL, NY and the uniform synthetic data
set. The query I/O and execution time results are shown in Figures
19(a) and 19(b), respectively. The results show that the
B^x(VP)-tree and the TPR*(VP)-tree consistently outperform their
unpartitioned counterparts for road network data sets. The query I/O
performance improvement ranges from 280% for the B^x-tree on
the CH data set to 20% for the TPR*-tree on the NY
data set. The performance improvement is due to the fact that the VP
technique is able to exploit the presence of DVAs in these data sets.

In general, the VP technique is able to improve the query
performance of the B^x-tree more than that of the TPR*-tree because
the B^x-tree does not attempt to group objects travelling in similar
directions at all. In contrast, the insertion algorithm of the
TPR*-tree attempts to group objects travelling in the same direction
into the same tree node, albeit in a locally optimized way instead of
the globally optimized way of the VP technique. Therefore, for the
TPR*-tree, the performance advantage of using the VP technique is
diminished.

Figure 19: Effect of varying data sets

The results for the uniform data set show that the performance
advantage of the B^x(VP)-tree and the TPR*(VP)-tree over their
unpartitioned counterparts is removed. This is because in the
uniform data set there are no DVAs, and therefore nothing can be
gained from partitioning the index by velocity distributions. In
some cases, the B^x(VP)-tree performs slightly worse than its
unpartitioned counterpart because of the overhead of maintaining
multiple indexes and frequently computing an updated τ threshold.

The update I/O and execution time results for this experiment are
shown in Figures 19(c) and 19(d), respectively. The TPR*(VP)-tree
outperforms the TPR*-tree by up to a factor of 1.7 for average
update I/O cost and up to a factor of 1.9 for average execution
time. This is because both the deletion and insertion algorithms
of the TPR*-tree involve traversing the tree in a similar fashion
to the query. Our algorithm is better at querying than the
unpartitioned TPR*-tree. This fact, combined with the fact that each
of the partitioned indexes is smaller than the single unpartitioned
TPR*-tree, explains the faster update performance of the
TPR*(VP)-tree compared to the unpartitioned TPR*-tree. However,
the update performance of the B^x(VP)-tree and the unpartitioned
B^x-tree are similar. This is because for the B^x-tree the update
performance is directly proportional to the height of the tree.
The heights of the B^x(VP)-tree and the unpartitioned B^x-tree are the
same in our experiments. In fact, the B^x(VP)-tree is slightly worse
than the B^x-tree for update performance because buffering is
more effective when there are fewer trees and because the B^x(VP)-tree
needs to frequently compute an updated τ threshold.

For the remaining experiments, we report only query cost results
and omit the update results, because the technique proposed in this
paper is mainly aimed at improving query performance and because
space is limited.

6.4 Effect of Data Size on Range Query 
Figure 20: Effect of data size on range query



In this experiment, we examine the query performance of each
index while varying the number of objects. Figure 20 shows that,
as the data size grows, the query cost increases approximately
linearly across all indexes. The B^x-tree has the worst query
performance and scales poorly with an increasing number of
objects. The results show that the B^x(VP)-tree improves on the
performance of the unpartitioned B^x-tree by up to a factor of 3.4
for I/O and a factor of 2.8 for execution time. The performance
improvement of the TPR*(VP)-tree over the unpartitioned TPR*-tree
is more modest, at up to a factor of 1.8 for I/O and 1.9 for
execution time. The reason is the same as explained in the previous
section, namely that the TPR*-tree already attempts to group objects
moving in the same direction into the same tree node, whereas the
B^x-tree does not.
6.5 Effect of Maximum Object Speed on Range Query
Figure 21: Effect of maximum object speed on range query

In this experiment, we study the effect of varying the maximum
object speed on the query performance of all the indexes.
Figure 21 shows that the B^x-tree suffers the most from increases
in the maximum object speed and exhibits the steepest increase in
both query I/O and query execution time. The reason is that it uses
the maximum velocity when expanding queries.

The results show that the VP technique improves the
performance of the unpartitioned indexes by an increasing margin
as the maximum object speed increases. This matches the analysis
of Section 4.
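
A simplified model shows why the margin widens with speed. The
sketch below assumes a square query of width side that is enlarged
on every side by (speed x predictive time) along each axis. The name
expanded_area, the residual speed v_perp and the sample numbers are
illustrative assumptions, not the formulas of Section 4. In an
unpartitioned index both axes are expanded with the maximum speed;
in a VP index the axis perpendicular to the DVA only needs the small
perpendicular speed.

```python
def expanded_area(side, v_axis, v_perp, t):
    """Area of a square query of width `side` after enlarging it by
    speed * t on each side, using v_axis on one axis and v_perp on the other."""
    return (side + 2 * v_axis * t) * (side + 2 * v_perp * t)


# Assumed values: query width, predictive time, residual speed across the DVA.
side, t, v_perp = 500.0, 60.0, 1.0
for v_max in (10.0, 20.0, 40.0):
    unpartitioned = expanded_area(side, v_max, v_max, t)  # quadratic in v_max
    partitioned = expanded_area(side, v_max, v_perp, t)   # near linear in v_max
    print(v_max, round(unpartitioned / partitioned, 2))
```

In this model the ratio of the two areas grows with v_max, which is
the qualitative trend seen in the experiment.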

The B^x(VP)-tree outperforms the B^x-tree by up to a factor of 3.4
for average query I/O and up to a factor of 2.8 for query execution
time. The TPR*(VP)-tree outperforms the TPR*-tree by up to a
factor of 2 for average query I/O and up to a factor of 2.1 for query
execution time.

6.6 Effect of Range Query Size on Range Query 
Figure 22: Effect of range query size on range query

In this experiment, we vary the radius of the range query. The
results in Figure 22 again show that the VP technique is more
effective at improving the performance of the B^x-tree than that of
the TPR*-tree. However, as the query size grows, the performance
difference between the B^x(VP)-tree and the TPR*(VP)-tree and their
unpartitioned counterparts becomes smaller in percentage terms.
The reason is that, as the query window gets larger, the extent
of the query dominates over the query expansion due to the
object velocities. The VP technique only reduces query expansion by
partitioning the index according to object velocities and does not
reduce the query extent.
More specifically, the results show that for a small query size
(radius = 100m) the B^x(VP)-tree outperforms the B^x-tree by up to
a factor of 3.5 for query I/O and 2.8 for query execution time, and
the TPR*(VP)-tree outperforms the TPR*-tree by up to a factor of
3.6 for query I/O and 3.8 for query execution time.
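
The shrinking relative gain for larger queries can be seen in a
simplified model (an assumption for illustration, not the analysis
of the paper): take a square query of side s, maximum speed v, a
small residual speed \epsilon perpendicular to the DVA, and
predictive time t, and expand the query by speed times t on each
side. The ratio of the expanded search areas without and with
velocity partitioning is then

```latex
\frac{A_{\text{unpartitioned}}}{A_{\text{VP}}}
  = \frac{(s + 2vt)^2}{(s + 2vt)(s + 2\epsilon t)}
  = \frac{s + 2vt}{s + 2\epsilon t}
  \;\longrightarrow\; 1 \quad \text{as } s \to \infty .
```

For fixed v and t the ratio falls toward 1 as s grows, which is
consistent with the query extent dominating the velocity-induced
expansion for large queries.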



6.7 Effect of Query Predictive Time on Range Query
Figure 23: Effect of query predictive time on range query

In this experiment, we vary the query predictive time from 20
to 120 time stamps. This experiment is important because it
demonstrates how well we can restrict the expansion of the search
space as we query further into the future. The results in Figure 23
again show that the query performance of the B^x-tree degrades much
faster with increasing query predictive time than that of the other
indexes. Again, the VP technique has a larger impact on
the performance of the B^x-tree than on that of the TPR*-tree.
The reason is similar to that in the maximum speed experiment:
the B^x-tree expands the query too much, this time because of a
larger time value rather than a larger velocity value.

6.8 Effect of Query Predictive Time on Rectangular Range Query
Figure 24: Effect of query predictive time on the rectangular
range query

As mentioned earlier, we conducted the same set of experiments
for the rectangular range query as for the circular range query,
and the results were similar. Due to space limitations, we
show only representative results for the rectangular range query.
We choose the experiment that varies the query predictive time
because it tests the ability of the indexes to handle varying rates
of query search space expansion.

In this experiment, the rectangular range queries have side lengths
of 1000 m x 1000 m. The results are almost the same as the results
for the circular range query.

7. CONCLUSION 

We have proposed the VP technique, a novel method that improves
the performance of moving object indexes by exploiting the skew
in the velocity distribution. The main idea is to partition objects
based on their moving directions and then use a separate index for
the objects moving along each dominant velocity axis. We first
provided an analysis to show why this idea should work. Then, we
proposed several algorithms to achieve effective velocity
partitioning. The VP technique can be applied to most moving object
index structures. Finally, we implemented it on two representative
index structures, the TPR*-tree and the B^x-tree, and performed
extensive experiments on both real and synthetic data sets. The
results show that these index structures equipped with the VP
technique consistently outperform their original versions.